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1 INTRODUCTION 



Ever since th e sem inal works of Sim kinl lll974t) and 
iTonrv fe David lll979l) . the cross-correlation technique to 
measure astronomical Doppler shifts has become extremely 
popular. The advent of digitized spectra and computers 
made it the preferred method. It has been applied in all 
the astronomical fields that require the measurement of ra- 
dial velocities from observed spectra, ranging from binary 
and multiple stellar systems to cosmology. In recent years, 
improvements in the precision of radial velocities measured 
through cross-correla tion led to the detectio n of many ex- 
trasolar planets (e.g., M ayor fc QueloJ [T995). 

The cross-correlation technique is conceptually simple 
and can be presented in an intuitive manner. Neverthe- 
less, its properties have been studied extensively, in an 
effort to improve its precision and overcome its few lim- 
itations. Thus, various met hods have been suggested to 
estimate its precision (e.g.. iTonry fc Davlsl Il979l : IConnesI 
Il985l: iMurdoch fc Hearnshawl ll99lT) . TODCOR - a Two- 
Dimensional Correlation technique, was introduced as a 
generalization of cross-correlation meant to measure the 
Dopp ler shifts of two blended spectra iZucker fc Mazehl 

Due to the progress in detector technology, the mod- 
ern spectrographs pose a new challenge to the technique, 
by producing multi-order spectra. In order to maximize the 
precision of the measured velocities, we need to combine 
the spectral information in all the relevant orders, poten- 
tially reducing the signal-to-noise (S/N) required to achieve 
a spec ified precision. One a pproa ch was suggested bv lConnesl 
(1985) and lBouchv et ahl l)200rf) . who calculated the cross- 
correlation for each order separately, and then calculated a 
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weighted average of t he result ing cross-correlatio n functions. 
The weights used bv lConnesl and iBouchv et all reflect their 



estimate of the S/N in the corresponding orders. 

This paper offers another approach to study the prop- 
erties of cross-correlation. It is shown that under certain 
assumptions, measuring the radial velocity through cross- 
correlation is equivalent to using maximum-likelihood anal- 
ysis for the measurement. This approach has been already 
used in co mmunication t heory in the context of signal detec- 
tion (e.g.. |Proakislll995l) . This approach leads in a natural 
way to an error estimate and to a new scheme for combining 
cross-correlation functions. This new scheme is shown, the- 
oretically and through simulations, to be superior over the 
other schemes proposed, both in terms of precision and in 
terms of signal detection capability. 

Section [5] shows how cross-correlation can be de- 
rived from maximum-likelihood theory. It derives an error- 
estimate and shows simulations to test this error estimate. 
The new combination scheme is derived from maximum- 
likelihood principles in Section |H] Several simulations to 
demonstrate the power of the technique are also presented 
in this Section. The paper is concluded by a few remarks in 
Section 0] 



2 FROM MAXIMUM LIKELIHOOD TO 
CROSS-CORRELATION 

2.1 The basic assumptions 

Let /(n) denote the observed spectrum, whose Doppler shift 
is to be found by correlating it against g(n) - the 'template' 
of zero shift. Both the stellar spectrum and the template are 
assumed to be described as functions of the bin number - n, 
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where n = A In A + B . Thus, the Do ppler shift results in a 
uniform linear shift of the spectrum llTbnrv fc Davislll979t) . 

Now let us introduce the statistical model. Under this 
model we assume that the observed spectrum was produced 
by multiplying the template by a scaling constant (do), shift- 
ing it (so bins) and adding a random white gaussian noise 
with a fixed standard deviation (<7o), i.e.: 

f(n) = a g(n - s ) + d n , 



d„ 



N(0,a 2 ) 



Obviously, ao an d So are not known in advance. In the litera- 
ture, o"o is usually assumed to be known, but in the following 
derivation we will assume no prior knowledge of ao- 

A few simplifying assumptions are usually applied, 
which are also used in this work. Thus, the spectra involved 
are assumed to be "continuum-subtracted", i.e. a best-fit 
low-order polynomial has been subtracted from the spectra. 
As a result, the spectra will have a zero mean: 
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n 



The number of bins is typically very large (usually in the 
order of thousands) and thus, for small shifts (even tens of 
bins), one can neglect edge effects and assume that for each 
shift, the overlap length is N - the total number of bins. 



2.2 Maximum-likelihood estimation 

The likelihood is defined by the probability of the observed 
results, under the assumed model, as a function of the model 
parameters. In our case: 



L = 



n 



V2na 2 

N 



exp 



1 



v 7 ^ 



exp ■ 



-[/(n)-ag(n-s)] 2 
2a 2 

lf(n) -ag{n-s)] 2 



£ 



2a 2 



The usual practice in maximum-likelihood estimation 
theory is to find the values of the parameters which maxi- 
mize the natural logarithm of the likelihood fuction. In our 
case this logarithm is: 



log L = —N log a 



2a 2 ^ 



[/(n) — ag(n — s)] 2 + const. 



Let us introduce some notation which will facilitate the 
derivation from here on. Thus, let s/ and s g denote the stan- 
dard deviation of the analysed spectrum and the template: 



2 



-T 



Define C(s) as the cross-correlation function of / and g: 

c(s) = m , 

S f S 9 

where the normalization forces |C(s)| to be smaller then 
unity. Since C(s) is proportional to R(s), s can be defined 
as the shift which maximizes the cross-correlation function. 

Using these notations, the expressions for a and a re- 
duce to: 



s}[l 
Ms) 



C 2 (s)] 



Substituting these expressions again into the expression for 
log L, we get: 



logL 



C (s)] + const. 



(1) 



Thus, we see that the likelihood is an increasing mono- 
tonic function of the squared cross-correlation. The depen- 
dence on the square of the cross-correlation, instead of the 
cross-correlation itself, means that the likelihood rises for 
negative values of the correlation. This behaviour is related 
to the formal possibility that the scaling factor - a - be nega- 
tive. This has no meaning in the astrophysical context, since 
it means all the expected asborption lines in one component 
appear as emission lines. 



2.3 Error estimate 

Maximum-likelihood theory also includes a way to estimate 
the errors, or the confidence intervals, for the parameters. 
The covariance matrix of the parameters has been shown, 
at non-pathological cases, to be the negative inverse of the 
Hessi an matrix of the log-likel ihood function at its maximum 
(e.g., iKendall fc Stuardll967l) . Let us calculate this matrix 
and substitute the values at the maximum, treating the shift, 
s, as a continuous variable: 



where const, is independent of the parameters. As it turns 
out, the values of a, a and s which maximize this function 
are the solutions of the equations: 

n 

a ~ E„s 2 W 

and s is the value which maximizes the cross-covariance 
function, which is defined as: 
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where R (s) and R"{s) are the first and second derivatives of 
the cross-covariance function. The last two equations follow 
from the definition of s as the shift where the maximum 
correlation is achieved. 

Thus, we see that the Hessian matrix is, fortunately, 
diagonal and the inversion can be accomplished easily. The 
resulting squared error is: 

a 2 = ^ u r C"(g) C 2 (§) r l 

aNR"(s) L C(s) l-C 2 (s) J ' 

where C"(s) is the second derivative of the cross-correlation 
function. 

It is instructive to examine the above expression. It 
is factored into three separate factors: N, the number of 
bins, which has an inverse relation to the error, as is ex- 
pected intuitively; , which is a normalized measure of 
the sharpness of the cross-correlation peak, and it depends 
very weakly on the S/N. A larger absolute value of 
implies a sharper peak and therefore a smaller error, and 



vice versa. The third factor, 



is a measure of the 



"line" S/N ratio of the spectrum, i.e., the ratio between the 
signal standard-deviation and that of the noise. This last 
statement can be easily seen by the relation of 1 — C 2 (s) to 
the noise standard deviation. 

A different error estimate would result if the template is 
also taken to be a noi sy spectrum, as is sometime s assumed 
in the literature (e.g.. IVerschueren fc D avid 1999) and thus 
the procedure would have to estimate the 'true' template us- 
ing the spectrum and the template. The formal calculations 
are much more complicated in this case and the resulting 
error estimate is approximately y/2 times larger than the 
one quoted above. However, in the usual practice, where the 
same template is used for many observed spectra, the model 
presented here is more suitable, since the scatter of the esti- 
mated velocities would be influenced only by the noise in the 
spectra, whereas the noise in the template would contribute 
a systematic error. 

An important feature of maximum-likelihood estima- 
tion is that asymptotically (for large N), it produce s 
a "minimum-variance estimator" jKendall fe Stuardll967ft . 
This means we cannot expect to achieve an estimate 
with smaller errors. Since we have shown that the cross- 
correlation estimate of the shift is equivalent to a maximum- 
likelihood estimate, this shows the optimality of the cross- 
correlation method. It is important to bear in mind, though, 
that the statistical model we used was very simplistic, and 
other techniques may prove better for removing effects not 
included in this model, such a s spectral mi smatch or instru- 
mental long-term trends (e.g.. IChellill200oT) . 

2.4 Simulations 

The simulations presented here aim to test the above er- 
ror estimate under controlled conditions. All the simula- 
tions use a spectrum of the G-dwarf HD 38858, obtained by 
CORALIE, as part of a program to derive precis e abundances 
of planet-h osting and non-planet-hosting stars iSantos et alJ 
l200d l200ll) . To test the error estimate a single spectral or- 
der was used, ranging from 5 621 to 5 683 A, with 1 920 bins. 
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Figure 1. A single order from the observed multi-order spectrum 
of HD 38858. This order is the one used as a template for the error 
analysis simulation. 
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Figure 2. Three simulated spectra, using the template shown 
in Fig.0 The noise levels used for the simulation are 0.05 (top), 
0.1 (middle) and 0.2 (bottom). 



Fig. presents this template spectrum. In order to simu- 
late the above specified statistical model, care was taken 
to eliminate other effects which are not tested in this work. 
Thus, the assumed doppler shift was always 0, to avoid edge- 
effects, and the spectrum was rectified to a constant contin- 
uum level of unity. 

The noise added to the spectra was a simple Gaussian 
white noise, with standard deviations of 0.05, 0.1 or 0.2. 
Sample noisy spectra are shown in Fig. [5] These spectra 
were cross-correlated against the known template, resulting 
in the cross-correlation functions presented in Fig. |^1 The 
exact peak locations and the relevant derivatives were es- 
timated by interpolating with a parabola around the peak. 
Each simulation was repeated 10 000 times. Figure 0] shows 
the distribution of the ratio between the actual offset of 
the cross-correlation peak (Aw) and the error estimate (ct v ). 
The figure shows that this ratio is distributed as a zero- 
mean unit-variance Gaussian for all the noise-levels tested, 
as we would expect for a valid error estimate. The apparent 
Gaussian nature of the distribution, and especially its uni- 
modality, show the regularity of the measurement error and 
its estimate. In later sections we will use the same proce- 
dure to test the error estimate of the proposed combination 
scheme and expect to obtain similar distributions. 
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Figure 3. The cross-correlation functions between the spectra 
shown in Fig, Inland the template (Fig. 0. 
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Figure 4. Histograms of the ratio between the offset of the cross- 
correlation function (Aw) and the estimated error (cr„). The three 
panels correspond to the three noise levels used in Fig. 01 and 131 
The dashed lines represent a zero-mean unit-variance Gaussian 
distribution. 



3 COMBINING CROSS-CORRELATION 
FUNCTIONS 

3.1 The combination scheme 

Suppose we now have M separate spectra, each one cor- 
responding to a separate template, and all are assumed to 
have the same Doppler shift. These separate spectra can ei- 
ther be separate exposures of the same object, or separate 
orders of the same exposure. The following analysis assumes 
they all possess the same number of bins - N, although this 
assumption can be very easily relaxed. We do not assume 
the same S/N ratios for the various spectra nor the same 
scaling constants. 

The most simple scheme to combine cross-correlation 



functions into one single function is the straightforward, 
non- weighted average (SA): 



sa ^=mE^ s 



where M is the number of cross-correlation functions to be 
combined, while d(s) is the i-th cross-correlation, calcu- 
lated between the i-th spectrum and the i-th template. 

Another simple combination scheme is based on the so- 
called "determination coefficient", i.e., the squared correla- 
tion coefficient - C 2 (s) in our case. This quantity is com- 
monly used in regression analysis to describe the degree to 
whic h the v ariability in one variable is explained by the other 
fe.g.. lHavsll98ct) . and is supposed to be additive. Therefore, 
we may try to combine these determination coefficients (DC) 
to get: 

dc 2 ( s ) = ^]Tc 2 ( s ). 

i 

This value can be interpreted as a weighted sum of the cor- 
relations. Each term is weighted by itself, since its value - 
the correlation - is also a measure of its reliability. 

The likelihood approach can provide another scheme 
to combine cross-correlations. First, let us see what is the 
overall likelihood of the observations: 



exp ■ 



7^2 y^[fi(n)-aigi{n-s) 



E 



Converting, as usual to the logarithm, we get: 

logL = |at log cr* - E^ n ) - a *5*( n_s )] a | + 

Substituting the optimal values of cji and a% leads eventually 
to: 



const. 



logL = ~^Iog[l-C?( a )] + 



const. 



which is very similar to eq.0 Therefore we can equate these 
two expressions to get an "effective" correlation value - ML: 

NM log [l-ML 2 (s)] = ^AHog [l-Cf(s)] 



ML 2 ( S ) = 1- |n[l-C?( S )] | 



l/M 



As it turns out, for sufficiently small d{s) the above 
expression reduces to the expression of the "determination 
coefficient" mean - DC(s). On the other extreme, if one 
d{s) approaches unity (meaning it has a very high S/N), 
DC(s) will approach unity. This can be understood intu- 
itively, since the other cross-correlations probably represent 
much poorer S/N and therefore can be neglected. 

In order to obtain an error estimate for the shift, the 
Hessian of the likelihood has to be calculated and inverted. 
This time, the Hessian is not diagonal, but a laborious cal- 
culation yields the following expression for the error: 



J- r Af , r ML"(g) ML 2 (a) , 
a °--\ MN ML{S) 1-ML 2 («)J 
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Figure 5. Top: Cross-correlation function corresponding to a 
single order, with a noise level of 3.5. Middle: A simple average 
of the cross-correlations of 10 orders, all with a noise level of 3.5. 
Bottom: maximum-likelihood combination of the same 10 cross- 
correlations. 



which is remarkably similar to the expression for the error in 
the single spectrum case, except for using the total number 
of bins - MN instead of N\ 



3.2 Simulations 

In order to demonstrate the effect of combining cross- 
correlations, 10 spectral orders were used, taken from the 
same spectrum used in !2.4l covering the spectral range from 
5 189 to 5 683 A. As a first test, noise was added to these 10 
orders, with a very high standard deviation of 3.5. Figure 
presents the results of this test. The top panel shows a cross- 
correlation function corresponding to one of those orders. No 
prominent peak is present in the cross-correlation. The mid- 
dle panel shows a simple average of the ten cross-correlation 
functions (SA). The correct peak is evidently present. It is 
similarly present in the maximum-likelihood combined cor- 
relation - ML, as the bottom panel shows. Since the cross- 
correlation values are very small, DC and ML are virtually 
identical. This simulation demonstrates the power of com- 
bining cross-correlation functions in general. In this case no 
velocity could have been measured for any individual or- 
der, but the cumulative effect of the 10 orders produced a 
detectable peak. 

The next test shows the optimality of the maximum- 
likelihood scheme (ML) compared to the two averages SA 
and DC. This is evident mainly in cases where the S/N ra- 
tio is strongly varying among the analysed spectra. To show 
that, noise was added to the 10 orders with standard devia- 
tions of O.lz, where i is the order number. FigureElshows the 
distribution of the peak offset for each of the combination 
schemes, after repeating the simulation for 1000 times. In 
addition, the Figure also includes a histogram of the veloci- 
ties derived by a weighted average of the velocities obtained 
from each order separately (VA). The weighting was based 
on the individual error estimates. Obviously, VA can be cal- 
culated only if the correct peaks can be identified in the 
cross-correlation functions of all orders. The histogram cor- 
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Figure 6. The distribution of the peak location in each of the 
four combination schemes. See text for the details of the simula- 
tions. All four histograms are drawn on the same horizontal scale 
for the sake of comparison. 
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Figure 7. Histogram of the ratio between the offset of the 
maximum-likelihood combined correlations (A«) and the esti- 
mated error (u v ). The dashed line represent a zero- mean unit- 
variance Gaussian distribution. 



responding to ML is evidently less dispersed than all others. 
Numerically, the four standard deviations corresponding to 
SA, DC, VA and ML were 0.128, 0.078, 0.099 and 0.068 
kms -1 . 

An additional criterion to distinguish among combina- 
tion schemes is their ability to accentuate the correct peak 
relative to the spurious peaks. This ability is important for 
detection of faint objects, with low S/N. To test this abil- 
ity, the typical height of the spurious peaks was estimated 
by the standard deviation of the function in a region that is 
safely distant from the correct peak. Specifically, this stan- 
dard deviation was estimated on two segments of 500 bins, 
on both sides of the correct peak, at a distance of at least 
100 bins from it. The mean ratios of the peak height to this 
standard deviation were 36, 58 and 65 for SA, DC and ML, 
correspondingly. Once again, we see that ML provides the 
most prominent correct peak. 

The above tests were repeated in numerous other condi- 
tions, in all of them ML was found to be the optimal scheme 
to combine cross-correlations. 

After establishing the optimality of the maximum- 
likelihood combination scheme, its error estimate had to be 
validated, in a similar fashion to the validation done for the 
single spectrum case. Figure |7| shows the Av/a v calculated 
for the ML functions in the 1000 simulations of 10 orders. 
Once again, we see a very good agreement with a zero-mean 
unit-variance Gaussian. 
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4 CONCLUDING REMARKS 

This paper presents a novel approach for combining cross- 
correlation functions. This approach is based on the relation 
of cross-correlation to maximum-likelihood estimation. Un- 
like previously suggested approaches, this approach is not 
based on a linear combination of the individual functions. 
Previous works looked for the optimal li near combina tion, 
without considering other options (e.g., IConneslll985t) . In 
cases where the scaling constants and the signal-to-noise 
ratios are known, maximum-likelihood analysis will indeed 
lead to a linear combination. However, exact estimate of the 
signal-to- noise is difficult in real cases. The suggested com- 
bination does not use any such external estimate. 

Although the white gaussian noise model is assumed in 
most analyses of the cross-correlation, it is obvious that real- 
ity is much more complex. Thus, the noise level may depend 
on the intensity, like in Poissonian noise, or vary across the 
spectrum because of illumination effects. Additional com- 
plexities mainly include long-term trends and spectral mis- 
match of the template to the analysed spectrum. Attempts 
to optimize radial-velocity measurement to deal with these 
problems can be based also on maximum-likelihood theory. 

The suggested approach was already applied success- 
fully to the case of HD 41004, where 30 CORALIE or- 
ders were combined i n order to detect a very faint com- 
panion of a K star jZucker et alJ I2003T) . The combined 
functions were not the conventional cross-correlation func- 
tions, but two-dimension al correlation (TODCOR) functions 
iZucker fc Mazehl Il994h . The successful detection of the 
faint companion allowed a more detaile d solution of the sys - 
tem and confirmed earlier hypotheses of lSantos et alJ |2002). 
This case demonstrates the power of the technique to en- 
hance the capabilities of traditional cross-correlation and 
improve its ability to detect and measure very faint objects, 
with very low S/N. 
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